Abstract

This paper studies the wireless spectrum sharing between a pair of distributed primary radio (PR) and cognitive 
radio (CR) links. Assuming that the PR link adapts its transmit power and/or rate upon receiving an interference 
signal from the CR and such transmit adaptations are observable by the CR, this results in a new form of feedback 
from the PR to CR, referred to as hidden PR feedback, whereby the CR learns the PR's strategy for transmit
adaptations without the need of a dedicated feedback channel from the PR. In this paper, we exploit the hidden 
PR feedback to design new learning and transmission schemes for spectrum sharing based CRs, namely active 
learning and supervised transmission. For active learning, the CR initiatively sends a probing signal to interfere 
with the PR, and from the observed PR transmit adaptations the CR estimates the channel gain from its transmitter 
to the PR receiver, which is essential for the CR to control its interference to the PR during the subsequent data 
transmission. This paper proposes a new transmission protocol for the CR to implement the active learning and the 
solutions to deal with various practical issues for implementation, such as time synchronization, rate estimation 
granularity, power measurement noise, and channel variation. Furthermore, with the acquired knowledge from 
active learning, the CR designs a supervised data transmission by effectively controlling the interference powers 
both to and from the PR, so as to achieve the optimum performance tradeoffs for the PR and CR links. Numerical 
results are provided to evaluate the effectiveness of the proposed schemes for CRs under different system setups. 

Index Terms 

Active learning, cognitive radio, hidden feedback, spectrum sharing, supervised transmission. 

I. Introduction 

Opportunistic spectrum access (OSA) and spectrum sharing (SS) are two basic operation models for 
the secondary radio or so-called cognitive radio (CR) system to operate over a common frequency band 
with an existing primary radio (PR) system. For the OSA model (see, e.g., [0Q), the CR usually deploys a 
spectrum sensing technique to detect the PR transmission on-off status over the frequency band of interest, 
and decides to transmit over this band if the sensing result indicates that the PR is not transmitting with a 
high probability. In contrast, the SS model (see, e.g., A3), flU) allows the CR to transmit concurrently
with the PR over the same frequency band, provided that the CR knows how to control its interference to 
the PR such that the resultant PR performance degradation is tolerable. Since SS-based CRs in general 
utilize the spectrum more efficiently than OSA-based CRs, this paper focuses on the SS model for CRs. 

One commonly adopted method for SS-based CRs to protect the PR transmission is via imposing an 
interference temperature constraint (ITC) over the CR transmission, i.e., the CR interference power level 
at each PR receiver must be kept below a prescribed threshold 0, 0, 0, 0. Some important design 
issues related to the ITC -based approach are discussed as follows. First, the effectiveness of the ITC 
to protect the PR transmission needs to be addressed. In and ifTOTl. it has been shown that the ITC 
guarantees an upper bound on the maximum capacity loss of the PR channel due to the CR interference. 
In IfTTTl . an interesting interference diversity phenomenon was discovered, where the average ITC over 
different fading states was shown to be superior over the peak ITC counterpart for minimizing the PR 
ergodic/outage capacity losses. Second, it is pertinent to investigate more efficient methods for the CR to 
protect the PR than that with a fixed ITC. Such methods may exploit additional side information on the 
PR transmissions such as the PR's on-off status IfTOll , Automatic Repeat reQuest (ARQ) feedback lfT2ll . 
channel state information (CSI) IfTOl , |fT3~l , spatial signal space S, EL and frequency power allocation 
[fT5l , in order to set more appropriate interference power levels over time, frequency, or space for CR's 
opportunistic transmission. Thus, conventional ITCs are replaced by the more relevant PR performance 
loss constraints [[Toll . |[T6ll . However, although these new methods are promising to improve the PR and CR 
spectrum sharing throughput, they usually require substantial overheads for implementation as compared 
with the ITC. Third, even implementation of the ITC requires knowledge of the channel gain from the 
CR transmitter to the PR receiver, which is difficult to obtain for the CR without a dedicated feedback 
channel from the PR. If the PR link adopts a time-division-duplex (TDD) mode and thus the channel 
reciprocity holds between PR and CR terminals, the CR-to-PR channel gain can then be estimated by 
the CR from its observed PR signals, assuming prior knowledge of the PR transmit power. However, if 
a frequency-division-duplex (FDD) mode is adopted by the PR (i.e., PR terminal transmits and receives 
over two different frequency bands), channel reciprocity between PR and CR terminals does not hold in 
general. As a result, estimating CR-to-PR channels from the observed PR signals may fail for the CR. 



Motivated by the above discussions, this paper presents a new design paradigm for SS-based CRs, 
which resolves the CR-to-PR channel estimation problem for the CR, and also leads to a more efficient 
spectrum sharing solution than the conventional one with fixed ITCs. The proposed method exploits an 
interesting PR-CR interaction by assuming that the PR deploys a certain form of transmit power and/or rate
adaptations upon receiving an interference signal from the CR.¹ Specifically, suppose that the CR initially
transmits a probing signal to interfere with the PR receiver, which then sends back a control signal (via 
the PR feedback channel) to the PR transmitter for adapting transmit power and/or rate accordingly; 
finally, the PR transmit adaptations are observed by the CR. Thereby, the CR obtains knowledge on the 
PR deployed strategy for transmit adaptations without the need of a dedicated feedback channel from the 
PR. This implicit form of feedback from the PR to CR is thus named as hidden PR feedback. Since the 
CR initiatively sends a probing signal to interfere with the PR for activating the hidden PR feedback, 
this "active learning" principle is different from existing "passive learning" counterpart (e.g., detecting 
the PR on-off status or estimating the CR-to-PR channel gain via sensing the PR band only) for the 
design of CR systems. However, it should be pointed out that the probing signal from the CR can cause 
a temporary performance degradation of the PR, and thus needs to be properly designed (details will 
be given later in the paper). The use of the active learning approach for designing new spectrum sensing
techniques for OSA-based CRs has been studied in [19] and [20], while in this paper we apply this
interesting approach to design new learning and transmission schemes for SS-based CRs. It is worth
noting that although iteratively adapting transmit power and rate to cope with the co-channel interference
among users in decentralized communication systems has been studied in the literature (see, e.g., [21],
[22], [23]), the approach of exploiting the PR transmit adaptations to design new operation schemes for
the CR is a new contribution of this paper. Based on the hidden PR feedback, this paper proposes two 
new types of operations for SS-based CRs, which are described as follows. 

1 Under this assumption, this paper considers PR systems that have two-way communications such that one node can send control signals
to the other node for transmit adaptation. This assumption apparently does not apply to one-way communication systems (e.g., the TV
broadcasting system considered for WRAN [17]), but may find applications in existing cellular-based wireless systems (see, e.g., [18]).

• Active Learning: By probing the PR with interference and observing its transmit power/rate adaptations,
under certain conditions, the CR is able to estimate the channel gain from its transmitter to the
PR receiver, which is essential for the CR to control its interference to the PR during subsequent data
transmission. We refer to this new scheme for the CR as active learning, to distinguish it from existing
passive learning schemes in the literature.
• Supervised Transmission: With the acquired knowledge on the CR-to-PR channel gain and the PR
transmit adaptations from active learning, the CR is able to design a supervised data transmission
via controlling the interference power levels both to and from the PR. Thus, the CR ensures that
the resultant performance degradation of the PR is within a tolerable margin, and the CR achievable
rate is optimized under the "feedback" interference from the PR, which is in general coupled with
the CR transmit power due to the CR-to-PR interference and the resultant PR power adaptation.
This paper proposes a new transmission protocol for the CR to implement active learning, together with 
solutions to deal with various important practical issues such as time discrepancy between the PR and 
CR links, CR rate estimation granularity and power measurement noise, and PR/CR channel variations. 
This paper also analyzes the PR and CR jointly achievable rates with the CR supervised transmission. 
Moreover, this paper evaluates the effectiveness of the proposed CR learning and transmission schemes 
when the PR employs different transmit power/rate adaptation schemes over the fading channels [24].

The rest of this paper is organized as follows. Section II presents the system model. Section III describes
the hidden PR feedback with different PR transmit adaptation strategies. Section IV presents the active
learning method for the CR to estimate the CR-to-PR channel gain, a protocol to implement this method,
and various solutions to deal with practical issues. Section V studies the CR supervised data transmission
by analyzing the achievable rates of both the PR and CR links. Section VI provides numerical examples
to corroborate the proposed studies. Finally, Section VII concludes the paper.

II. System Model 

As shown in Fig. 1, for the purpose of exposition, this paper considers a simplified spectrum sharing
system, where one CR link consisting of a CR transmitter (CR-Tx) and a CR receiver (CR-Rx) shares
a narrow-band for transmission with one PR link consisting of a PR transmitter (PR-Tx) and a PR
receiver (PR-Rx). All the terminals involved are assumed to be each equipped with a single antenna. We
assume a block-fading channel model for all the channels shown in Fig. 1. We also assume coherent
communication for both the PR and CR links and thus only the fading channel power gain (amplitude
square) is of interest. In addition, since the proposed study in this paper applies to any particular channel
fading state, for notational brevity, we drop the channel fading state index for the following definitions.
Denote h c , h p , h cp , and h pc as the power gains of the channels from CR-Tx to CR-Rx, from PR-Tx to 
PR-Rx, from CR-Tx to PR-Rx, and from PR-Tx to CR-Rx, respectively. In addition, denote h pc as the 
channel power gain from PR-Tx to CR-Tx. Without loss of generality, it is assumed that the additive 
noises at both PR-Rx and CR-Rx are independent circularly symmetric complex Gaussian (CSCG) random 
variables with zero mean and variances denoted by $\sigma_p^2$ and $\sigma_c^2$, respectively.

First, consider the PR link. It is assumed that the PR is oblivious to the existence of the CR and 
treats the interference from CR-Tx as additional noise at the receiver. We assume that the PR employs 
certain form of transmit power and/or rate adaptations based upon the PR CSI as well as the interference 
power level received from the CR. Let $N_p$ denote the noise-plus-interference power level at PR-Rx, i.e.,
$N_p = \sigma_p^2 + h_{cp} p_c$, with $p_c$ denoting the transmit power of the CR. The PR transmit power, denoted
by $p_p$, is then given by $\mathcal{P}_p(\gamma_p)$, which defines a mapping from the PR "effective" channel power gain,
$\gamma_p = h_p/N_p$, to $p_p$. The PR is assumed to employ packet-based transmissions and the transmit rate of
one particular packet is denoted by $r_p$. For a given pair of $\gamma_p$ and $p_p$, $r_p$ is assumed equal to $\mathcal{R}_p(\mathrm{SNR}_p)$,
with $\mathrm{SNR}_p = \gamma_p p_p$ denoting the signal-to-noise (including both the additive noise and CR interference)
ratio (SNR) at PR-Rx. Note that the rate function $\mathcal{R}_p(\mathrm{SNR}_p)$ is specified by the employed modulation
and coding scheme (MCS) of the PR link.
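
The quantities defined above can be illustrated with a minimal Python sketch. The function names and the Shannon-type rate function are assumptions made here for illustration only; in the system model the rate function is fixed by the PR's MCS and is not necessarily logarithmic.

```python
import math

def effective_gain(h_p, sigma_p2, h_cp, p_c):
    """Effective PR channel power gain gamma_p = h_p / N_p,
    with N_p = sigma_p^2 + h_cp * p_c (noise plus CR interference)."""
    n_p = sigma_p2 + h_cp * p_c
    return h_p / n_p

def shannon_rate(snr):
    """Illustrative rate function R_p(SNR) = log2(1 + SNR) (an assumption)."""
    return math.log2(1.0 + snr)

def pr_rate(h_p, sigma_p2, h_cp, p_c, p_p, rate_fn=shannon_rate):
    """PR packet rate r_p = R_p(gamma_p * p_p) for a given PR transmit power p_p."""
    gamma_p = effective_gain(h_p, sigma_p2, h_cp, p_c)
    return rate_fn(gamma_p * p_p)
```

Raising `p_c` lowers `effective_gain` and, for a fixed `p_p`, lowers the PR rate, which is the coupling that the rest of the paper exploits.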

Next, consider the CR link. The CR is assumed to be aware of the PR, and furthermore protect the 
PR transmission by ensuring that the resultant performance loss of the PR due to the CR interference 
is within a tolerable margin. However, we consider a practical scenario where there is no dedicated 
communication channel for the PR to send any side information (e.g., h cp ) to the CR for facilitating its 
interference control to the PR. Consequently, the CR needs to fulfil the task of protecting the PR by its 
own effort. In this case, one possible method for the CR is to deploy spectrum sensing techniques to 
detect the PR on-off status, and then transmit if the sensing result indicates that the PR is not transmitting 
with a high probability (i.e., OSA-based CRs). In contrast, this paper studies more efficient methods for 
the CR to utilize the PR spectrum than sensing-based orthogonal transmission, where the CR manages 
to transmit even when the PR is transmitting over the same band (i.e., SS-based CRs). 



III. Hidden PR Feedback 

In this section, we illustrate the phenomenon of hidden PR feedback. First, consider for the PR link 
the following three commonly adopted power control policies in wireless communication: 
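• Constant Power (CP) Policy: $\mathcal{P}_p(\gamma_p) = Q$, $\forall \gamma_p > 0$, where $Q$ is a constant;

• Persistent Power Control Policy: $\mathcal{P}_p(\gamma_p) \geq \mathcal{P}_p(\gamma_p')$, for any $0 < \gamma_p < \gamma_p'$;

• Non-Persistent Power Control Policy: $\mathcal{P}_p(\gamma_p) \leq \mathcal{P}_p(\gamma_p')$, for any $0 < \gamma_p < \gamma_p'$.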

The CP policy is usually applied when PR-Tx has a strict peak power constraint given by Q over 
all transmitted packets, while the other two policies are applicable when PR-Tx is subject to an average 
power constraint and thus can change transmit powers over different packets. Note that with the persistent 
power control, $p_p$ usually increases when the effective channel power gain, $\gamma_p$, decreases. This type of
power control is usually applied for data traffic with a stringent quality-of-service (QoS) requirement in
terms of receiver SNR, $\mathrm{SNR}_p = \gamma_p p_p$. One well-known example in the literature for the persistent power
control is the so-called truncated channel inversion (TCI) [24],² which is expressed as

$$\mathcal{P}_p^{\mathrm{TCI}}(\gamma_p) = \begin{cases} \dfrac{\mathrm{SNR}_p^{(T)}}{\gamma_p}, & \gamma_p \geq \gamma_p^{(T)} \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where $\mathrm{SNR}_p^{(T)}$ is the given SNR target, while $\gamma_p^{(T)}$ is the threshold for $\gamma_p$ below which the PR decides to
take a "transmit outage", i.e., $p_p = 0$ and thus $r_p = 0$. $\gamma_p^{(T)}$ can be determined from the PR average transmit
power constraint and is related to the PR outage probability [24] (details are omitted here for brevity).
With the TCI power control, the PR transmits with a constant rate $r_p = \mathcal{R}_p(\mathrm{SNR}_p^{(T)})$ if $\gamma_p \geq \gamma_p^{(T)}$.

In contrast, with the non-persistent power control, the PR usually decreases its transmit power when
$\gamma_p$ decreases, in order to save transmit power for better opportunities with larger values of $\gamma_p$. One
well-known example for the non-persistent power control is the so-called water-filling (WF) [24] policy,
which is given by

$$\mathcal{P}_p^{\mathrm{WF}}(\gamma_p) = \begin{cases} \mu - \dfrac{1}{\gamma_p}, & \gamma_p > \dfrac{1}{\mu} \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

where $\mu$ is a constant, or the so-called "water-level", which can be determined from the PR average
transmit power constraint [24] (details are omitted here). The WF power control results in a variable-rate
transmission for the PR, where $r_p = \mathcal{R}_p(\gamma_p \mu - 1)$ if $\gamma_p > 1/\mu$, and $r_p = 0$ otherwise.

2 Strictly speaking, TCI is persistent only for the regime of $\gamma_p \geq \gamma_p^{(T)}$. Alternatively, TCI is persistent for all values of $\gamma_p$ in the
special case of $\gamma_p^{(T)} = 0$, where TCI reduces to the conventional channel inversion power control [24].

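From the above discussions, it is observed that $p_p$ and/or $r_p$ may vary with the values of $\gamma_p$. Since
$\gamma_p = h_p/(\sigma_p^2 + h_{cp} p_c)$ for a given fading state with fixed channel power gains $h_p$ and $h_{cp}$, it follows that
$\gamma_p$ is solely determined by the transmit power of the CR signal, $p_c$. More specifically, we can express $p_p$ and
$r_p$ in terms of $p_c$ for CP, TCI, and WF power control of the PR as follows.

$$p_p^{\mathrm{CP}} = Q \tag{3}$$

$$r_p^{\mathrm{CP}} = \mathcal{R}_p\!\left(\frac{h_p Q}{\sigma_p^2 + h_{cp} p_c}\right) \tag{4}$$

$$p_p^{\mathrm{TCI}} = \begin{cases} \dfrac{\mathrm{SNR}_p^{(T)}(\sigma_p^2 + h_{cp} p_c)}{h_p}, & p_c \leq \dfrac{h_p/\gamma_p^{(T)} - \sigma_p^2}{h_{cp}} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

$$r_p^{\mathrm{TCI}} = \begin{cases} \mathcal{R}_p(\mathrm{SNR}_p^{(T)}), & p_c \leq \dfrac{h_p/\gamma_p^{(T)} - \sigma_p^2}{h_{cp}} \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

$$p_p^{\mathrm{WF}} = \begin{cases} \mu - \dfrac{\sigma_p^2 + h_{cp} p_c}{h_p}, & p_c < \dfrac{\mu h_p - \sigma_p^2}{h_{cp}} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

$$r_p^{\mathrm{WF}} = \begin{cases} \mathcal{R}_p\!\left(\dfrac{\mu h_p}{\sigma_p^2 + h_{cp} p_c} - 1\right), & p_c < \dfrac{\mu h_p - \sigma_p^2}{h_{cp}} \\ 0, & \text{otherwise} \end{cases} \tag{8}$$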
In Fig. 2, $p_p$ and $r_p$ are plotted as functions of $p_c$, for the CP, TCI (assuming $h_p > \sigma_p^2 \gamma_p^{(T)}$), and
WF (assuming $h_p > \sigma_p^2/\mu$) power control of the PR, respectively. For the purpose of illustration, in
this example we assume that $\mathcal{R}_p(\mathrm{SNR}_p) = \log_2(1 + \mathrm{SNR}_p)$, which holds when the optimal Gaussian
codebook is used by the PR with interference from the CR treated as additive Gaussian noise. As observed,
by interfering with the PR with $p_c > 0$, the CR is usually able to make the PR change its transmit power
and/or rate for all considered PR power control policies. As a result, the corresponding changes occur in
the received PR signal power, $h_{pc} p_p$, and/or rate, $r_p$, at CR-Tx. Therefore, there exists a hidden PR power
and/or rate feedback observable by the CR, which is activated by the CR via initiatively interfering with 
the PR. In the following, we will apply this hidden PR feedback phenomenon to design new learning 
and transmission schemes for the CR. 
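
The hidden feedback for the three policies can be sketched numerically as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the rate function is taken as log2(1 + SNR) as in the example above, and the TCI and WF thresholds follow the piecewise definitions of the two policies.

```python
import math

def cp_power(gamma_p, q):
    """Constant power policy: transmit power Q regardless of gamma_p."""
    return q

def tci_power(gamma_p, snr_target, gamma_th):
    """Truncated channel inversion: invert the channel above the threshold, else outage."""
    return snr_target / gamma_p if gamma_p >= gamma_th else 0.0

def wf_power(gamma_p, mu):
    """Water-filling with water-level mu: mu - 1/gamma_p when positive, else 0."""
    return max(mu - 1.0 / gamma_p, 0.0)

def hidden_feedback(p_c, policy, h_p, h_cp, h_pc, sigma_p2):
    """Return (q_p, r_p) seen at CR-Tx when the CR transmits with power p_c.

    policy is a callable mapping gamma_p to the PR transmit power p_p.
    """
    gamma_p = h_p / (sigma_p2 + h_cp * p_c)
    p_p = policy(gamma_p)
    q_p = h_pc * p_p                       # received PR power at CR-Tx
    r_p = math.log2(1.0 + gamma_p * p_p)   # PR rate under the assumed rate function
    return q_p, r_p
```

With these definitions, a CR probe leaves the CP power unchanged but lowers its rate, raises the TCI power at a constant rate (within the non-outage regime), and lowers both power and rate under WF.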

IV. Active Learning

In this section, we apply the hidden PR feedback to design CR active learning with the goal of estimating 
the channel power gain from CR-Tx to PR-Rx, $h_{cp}$, which is essential for the CR to control the interference
to the PR during data transmission as discussed later in Section V. First, we present the proposed scheme
for the ideal case with a number of assumptions made. Then, we present a protocol for the CR to 
implement the proposed scheme and the solutions to deal with important issues for implementation with 
relaxed assumptions. 

A. CR-to-PR Channel Gain Estimation 

In this subsection, we propose a new scheme for CR-Tx to estimate h cp via active learning (i.e., without 
the need of a feedback channel from PR-Rx) under certain assumptions listed as follows. 

• The CR knows the PR transmission protocol and is able to synchronize its operation with the PR 
transmission. 

• In the case where the CR needs to extract rate information from the received PR signal, this can 
be done by the CR via certain techniques. Furthermore, the PR transmit rate, $\mathcal{R}_p(\mathrm{SNR}_p)$, is a
continuously increasing function of the receiver SNR, $\mathrm{SNR}_p$, and this function is known to the CR.

• In the case where the CR needs to estimate the received signal power from the PR, the effect of the 
receiver noise on the power estimation is ignored. 

• During the period for the proposed scheme to be implemented, all the channels involved in Fig. 1
remain constant. 

The above assumptions will be relaxed in the next subsection where implementation issues for the 
proposed scheme are addressed. 

Next, we present the scheme to estimate $h_{cp}$ as follows. Suppose that initially CR-Tx listens to the
PR transmission³ and observes the received signal power and rate from PR-Tx, represented by $q_p^{(0)} =
h_{pc} p_p^{(0)}$ and $r_p^{(0)} = \mathcal{R}_p(\gamma_p^{(0)} p_p^{(0)})$, respectively, with $p_p^{(0)}$ denoting the initial transmit power of the PR and
$\gamma_p^{(0)} = h_p/\sigma_p^2$. Next, CR-Tx broadcasts a probing signal of power $p_c$, and PR-Rx reacts upon receiving the
interference from CR-Tx by sending back to PR-Tx (via a dedicated feedback channel for the PR link)
a control signal to indicate transmit power and/or rate adaptation. Accordingly, PR-Tx resets transmit
power and rate to be $p_p^{(1)}$ and $r_p^{(1)}$, respectively, where $p_p^{(1)}$ depends on the employed power control policy
$\mathcal{P}_p$ of the PR and $r_p^{(1)} = \mathcal{R}_p(\gamma_p^{(1)} p_p^{(1)})$ with $\gamma_p^{(1)} = h_p/(\sigma_p^2 + p_c h_{cp})$. As a result, CR-Tx observes the
updated power received from PR-Tx, $q_p^{(1)} = h_{pc} p_p^{(1)}$, and the updated transmit rate of the PR, $r_p^{(1)}$. Under
the aforementioned assumptions, $q_p^{(0)}$, $r_p^{(0)}$, $q_p^{(1)}$, and $r_p^{(1)}$ are all perfectly observed by CR-Tx.

3 In practice, either CR-Tx or CR-Rx can observe the signal power and/or rate from PR-Tx to estimate $h_{cp}$ using the method presented
in this paper, while the one between them that has a superior channel quality from PR-Tx is more suitable for this task. For simplicity, this
paper assumes that this task is done by CR-Tx.

Without loss of generality, it can be assumed that in the above proposed scheme, $p_p^{(0)} > 0$ and thus
$q_p^{(0)} > 0$. This is so because if $p_p^{(0)} = 0$, the PR does not transmit initially, and thus the CR can
simply transmit as if the PR is not present and the estimation of $h_{cp}$ becomes unnecessary in this case.
Furthermore, note that if $p_p^{(0)} > 0$, there always exists a non-trivial interval of $p_c$ for which $p_p^{(1)} > 0$. This
is obvious with, e.g., the CP policy of the PR since $p_p^{(1)} = Q$ regardless of $p_c$, while with TCI power control,
from (5) it follows that $p_p^{(0)} > 0$ implies that $h_p/\gamma_p^{(T)} \geq \sigma_p^2$ and thus $p_p^{(1)} > 0$ provided that $p_c \leq (h_p/\gamma_p^{(T)} - \sigma_p^2)/h_{cp}$;
and with WF power control, from (7) it follows that $p_p^{(0)} > 0$ implies that $\mu h_p > \sigma_p^2$ and thus $p_p^{(1)} > 0$
provided that $p_c < (\mu h_p - \sigma_p^2)/h_{cp}$. Thus, without loss of generality, we can also assume that $q_p^{(1)} > 0$ (if not, the
CR can re-probe the PR with a smaller power $p_c$). Consequently, $r_p^{(0)} > 0$ and $r_p^{(1)} > 0$.

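Note that the observed $r_p^{(1)}$ contains side information on $h_{cp}$ to be estimated via the term $\gamma_p^{(1)}$. However,
$h_{cp}$ cannot be determined solely from $r_p^{(1)}$ since other relevant terms, $h_p$, $\sigma_p^2$, and $p_p^{(1)}$, are unknown to
the CR. Interestingly, CR-Tx can determine $h_{cp}/\sigma_p^2$ from the observed $q_p^{(0)}$, $r_p^{(0)}$, $q_p^{(1)}$, and $r_p^{(1)}$, and the
probing signal power $p_c$, as shown in the following proposition.

Proposition 4.1: Assuming that $q_p^{(0)}$, $r_p^{(0)}$, $q_p^{(1)}$, and $r_p^{(1)}$ are all strictly positive, the channel power gain
from CR-Tx to PR-Rx, $h_{cp}$, normalized to the noise power at PR-Rx, $\sigma_p^2$, can be estimated as

$$\frac{h_{cp}}{\sigma_p^2} = \left( \frac{\mathcal{R}_p^{-1}(r_p^{(0)})\, q_p^{(1)}}{\mathcal{R}_p^{-1}(r_p^{(1)})\, q_p^{(0)}} - 1 \right) \frac{1}{p_c} \tag{9}$$

where $\mathcal{R}_p^{-1}(\cdot)$ denotes the inverse function of $\mathcal{R}_p(\cdot)$.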
Proof: Since 

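$$\frac{q_p^{(1)}}{q_p^{(0)}} = \frac{h_{pc}\, p_p^{(1)}}{h_{pc}\, p_p^{(0)}} = \frac{p_p^{(1)}}{p_p^{(0)}} \tag{10}$$

and from the expressions of $r_p^{(0)}$ and $r_p^{(1)}$, it follows that

$$\frac{\mathcal{R}_p^{-1}(r_p^{(1)})}{\mathcal{R}_p^{-1}(r_p^{(0)})} = \frac{\gamma_p^{(1)}\, p_p^{(1)}}{\gamma_p^{(0)}\, p_p^{(0)}} \tag{11}$$

$$= \frac{h_p/(\sigma_p^2 + p_c h_{cp})}{h_p/\sigma_p^2} \cdot \frac{p_p^{(1)}}{p_p^{(0)}} \tag{12}$$

$$= \frac{\sigma_p^2}{\sigma_p^2 + p_c h_{cp}} \cdot \frac{p_p^{(1)}}{p_p^{(0)}}. \tag{13}$$

Using (10) and (13), (9) can be obtained.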



We see that Proposition 4.1 is mainly based upon the "hidden" equation in (11), which is due to the
PR transmit self-adaptation upon receiving the interference from the CR. Note that the method given
in Proposition 4.1 applies to any general PR transmit power/rate adaptation strategy, provided that at
least one of the PR transmit power and rate is changed after receiving interference from the CR. In the
two special cases of CP and TCI power control policies for the PR, for which $q_p^{(1)} = q_p^{(0)} = h_{pc} Q$ and
$r_p^{(1)} = r_p^{(0)} = \mathcal{R}_p(\mathrm{SNR}_p^{(T)})$, respectively, it easily follows that the estimation rule in (9) reduces to

$$\frac{h_{cp}}{\sigma_p^2} = \left( \frac{\mathcal{R}_p^{-1}(r_p^{(0)})}{\mathcal{R}_p^{-1}(r_p^{(1)})} - 1 \right) \frac{1}{p_c} \tag{14}$$

$$\frac{h_{cp}}{\sigma_p^2} = \left( \frac{q_p^{(1)}}{q_p^{(0)}} - 1 \right) \frac{1}{p_c}. \tag{15}$$

Therefore, only rate/power adaptation of the PR needs to be observed by the CR for the estimation of
$h_{cp}/\sigma_p^2$ in the case of CP/TCI power control for the PR.
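
As a numerical sanity check of the estimation rule in Proposition 4.1, the following Python sketch simulates one probing round against a PR that uses WF power control and recovers the normalized gain. The function names, the numerical values, and the rate function log2(1 + SNR) are assumptions chosen for illustration; the estimator itself only needs the inverse rate function and the four observations.

```python
import math

def estimate_hcp_over_noise(q0, r0, q1, r1, p_c, rate_inv):
    """Estimate h_cp / sigma_p^2 from observations before (0) and after (1) probing.

    rate_inv is the inverse of the PR rate function, mapping a rate to an SNR.
    All four observations are assumed strictly positive.
    """
    snr0 = rate_inv(r0)
    snr1 = rate_inv(r1)
    return ((snr0 * q1) / (snr1 * q0) - 1.0) / p_c

def simulate_wf_observation(p_c, h_p, h_cp, h_pc, sigma_p2, mu):
    """Observed (q_p, r_p) at CR-Tx for a PR using water-filling (illustrative)."""
    gamma_p = h_p / (sigma_p2 + h_cp * p_c)
    p_p = max(mu - 1.0 / gamma_p, 0.0)
    return h_pc * p_p, math.log2(1.0 + gamma_p * p_p)

# Hypothetical channel values; the CR knows none of h_p, h_cp, h_pc, sigma_p2.
h_p, h_cp, h_pc, sigma_p2, mu, p_c = 4.0, 0.5, 0.3, 1.0, 1.0, 2.0
q0, r0 = simulate_wf_observation(0.0, h_p, h_cp, h_pc, sigma_p2, mu)   # initial sensing
q1, r1 = simulate_wf_observation(p_c, h_p, h_cp, h_pc, sigma_p2, mu)   # after probing
estimate = estimate_hcp_over_noise(q0, r0, q1, r1, p_c, lambda r: 2.0 ** r - 1.0)
```

In this noiseless setting the estimate coincides with `h_cp / sigma_p2`, and the unknown gain `h_pc` cancels because only the ratio of the two received powers enters the rule.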

Note that the proposed new method for the CR to estimate $h_{cp}$ works in both cases of TDD and FDD
modes for the PR. For comparison, consider the conventional method where CR-Tx estimates $h_{cp}$ from
the received signal power from PR-Rx (when it transmits), denoted by $q_p = g_{pc} p_p$, with $g_{pc}$ denoting the
channel power gain from PR-Rx to CR-Tx and $p_p$ denoting the instantaneous transmit power of PR-Rx.
In contrast, the proposed method estimates $h_{cp}$ at either CR-Tx or CR-Rx based on the received signals
from PR-Tx. There are three major advantages of the proposed method over the conventional method.
First, for the conventional method, even in the case of PR TDD mode where channel reciprocity holds
such that $g_{pc} = h_{cp}$, $h_{cp}$ can be estimated only if $p_p$ is known at CR-Tx, which may not hold in practice.
In contrast, from (9) it is observed that the proposed method does not rely on the knowledge of PR
transmit power. Second, the assumption $g_{pc} = h_{cp}$ for the conventional method becomes problematic if
FDD mode is used for the PR, since $g_{pc}$ and $h_{cp}$ now correspond to two different frequency bands and
are thus different in general. In contrast, the proposed method works independently of the relationship
between $g_{pc}$ and $h_{cp}$. Third, the conventional method may estimate $h_{cp}$ but cannot give any information
on the noise power at PR-Rx, $\sigma_p^2$; as a result, CR-Tx cannot predict its resulting interference power level
at PR-Rx relative to $\sigma_p^2$. In contrast, the proposed method provides a direct estimate of $h_{cp}/\sigma_p^2$.

B. Implementation 

In this subsection, we address various implementation issues for the proposed active learning scheme. 
First, we present the transmission protocols for the PR and CR as follows. 

• PR Transmission Protocol: We consider the conventional pilot-training-based transmission protocol 
for the PR, where the transmission of PR-Tx is divided into orthogonal time blocks, each of which 
is further divided into two sub-blocks: one contains the training signal and the other contains the 
data signal, as shown in Fig. 3(a). The training signal is for PR-Rx to estimate the PR channel $h_p$
as well as the received noise power $N_p = \sigma_p^2 + h_{cp} p_c$ (including the received CR interference power
if $p_c > 0$). It is assumed that these estimates are perfect since in this paper we focus on the design of
CR transmission. Based on the estimated $h_p$ and $N_p$, PR-Rx computes the effective channel power
gain $\gamma_p = h_p/N_p$, and according to $\gamma_p$ designs a feedback signal for PR-Tx to adapt its transmit
power and/or rate for the next block transmission (for simplicity, we assume that there is no delay
or error for the PR feedback).

• CR Transmission Protocol: As shown in Fig. 3(b), the transmission protocol for the CR is more
sophisticated than the conventional pilot-training-based one for the PR. Specifically, each CR block
transmission consists of four stages: initial sensing, probing, re-sensing, and data transmission. For
initial sensing, CR-Tx observes the received PR signal power $q_p^{(0)}$ and/or rate $r_p^{(0)}$. Then, in the
probing stage, CR-Tx transmits a predesigned signal of power $p_c$ to interfere with PR-Rx. The
probing signal of CR-Tx can also be used as the training signal for CR-Rx. After that, CR-Tx goes
into the re-sensing stage to observe the updated PR signal power $q_p^{(1)}$ and/or rate $r_p^{(1)}$, and estimates
$h_{cp}/\sigma_p^2$ according to the rule given in (9). Last, based on the estimated channel and the observed PR
transmit adaptations, CR-Tx sets its transmit power and rate (details are given later in Section V),
and starts data transmission.
Next, we discuss the following important issues for implementing the above CR transmission protocol 
based on active learning. 

1) Time Synchronization: One important issue for the proposed scheme is the timing discrepancy
between the distributed PR and CR links due to the lack of a common reference clock. Let $\tau_p$, $\tau_{pc}$, and
$\tau_{cp}$ denote the propagation delays from PR-Tx to PR-Rx, from PR-Tx to CR-Tx, and from CR-Tx to
PR-Rx, respectively, with $\tau_p \leq (\tau_{pc} + \tau_{cp})$. In addition, let $s_p(t)$ denote the transmitted signal from PR-Tx.
Then, the received signals at PR-Rx and CR-Tx are $s_p(t - \tau_p)$ and $s_p(t - \tau_{pc})$ (the channel multiplicative
effect is ignored here since it is irrelevant to the discussion on time synchronization), respectively. Since
CR-Tx does not have a common clock with PR-Tx, it has to use the received signal from PR-Tx as a
reference clock. Hence, the transmitted probing signal from CR-Tx can be denoted as $s_c(t - \tau_{pc} + \Delta)$, where
$\Delta \geq 0$ denotes the transmission time ahead of the reference clock (to be specified later). Accordingly,
the received probing signal at PR-Rx is $s_c(t - \tau_{pc} + \Delta - \tau_{cp})$. Note that CR-Tx needs to make sure that
its probing signal arrives at PR-Rx prior to the PR training signal in one particular transmission block,
i.e., $\tau_{pc} - \Delta + \tau_{cp} \leq \tau_p$, to make an effective probing. Thus, it follows that $\Delta \geq \tau_{pc} + \tau_{cp} - \tau_p \geq 0$.
However, the exact values of $\tau_p$, $\tau_{pc}$, and $\tau_{cp}$ may not be known to CR-Tx. Instead, suppose that we know
that the maximum propagation delay between CR and PR terminals is less than $\tau_{\max}$. Then, by setting
$\Delta = 2\tau_{\max}$, it is ensured that the CR probing signal arrives at PR-Rx prior to the PR training signal.

On the other hand, the duration of the probing signal from CR-Tx, denoted by $T_c$, also needs to be
properly designed. Note that in order to minimize the temporary performance degradation of the PR link
due to the CR probing signal, it is desirable to choose a small value for $T_c$. However, for the probing signal
to be effective, it is also necessary to make $T_c$ sufficiently large such that the probing signal can overlap
with the entire training signal of the PR at PR-Rx in one particular transmission block. Let $T_p$ denote the
training signal duration of the PR, which is assumed known at CR-Tx. From the earlier discussion on
time synchronization, we know that PR-Rx observes the PR signal, $s_p(t - \tau_p)$, and the CR probing signal,
$s_c(t - \tau_{pc} + 2\tau_{\max} - \tau_{cp})$. Thus, the maximal gap for the arrival time of the CR probing signal ahead of
that of the PR training signal is $2\tau_{\max}$ when $\tau_p = (\tau_{pc} + \tau_{cp})$. Therefore, by setting $T_c = T_p + 2\tau_{\max}$, the
aforementioned requirements for choosing $T_c$ are both fulfilled.
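
The timing rule above can be checked with a short Python sketch. The function names and the delay values (in arbitrary time units) are hypothetical; the sketch only verifies that, with an advance of twice the maximum delay and a duration equal to the training length plus twice the maximum delay, the probing signal covers the whole PR training signal at PR-Rx.

```python
def probing_timing(tau_max, t_p):
    """Return (advance, duration) of the CR probing signal.

    advance  = 2 * tau_max        (transmission time ahead of the reference clock)
    duration = t_p + 2 * tau_max  (probing signal length)
    """
    return 2.0 * tau_max, t_p + 2.0 * tau_max

def probe_covers_training(tau_p, tau_pc, tau_cp, tau_max, t_p):
    """Check that the probe overlaps the entire PR training signal at PR-Rx.

    Times are measured relative to the start of the PR block at PR-Tx.
    """
    advance, duration = probing_timing(tau_max, t_p)
    probe_start = tau_pc - advance + tau_cp   # arrival of the probe at PR-Rx
    probe_end = probe_start + duration
    train_start = tau_p                       # arrival of the PR training at PR-Rx
    train_end = tau_p + t_p
    return probe_start <= train_start and probe_end >= train_end
```

For example, `probe_covers_training(1.0, 2.0, 1.5, 3.0, 100.0)` evaluates the rule for delays of 1, 2, and 1.5 time units under a maximum delay of 3 and a training length of 100.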
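
2) Rate Granularity: In the estimation rule given by (9), it has been assumed that the transmit rate of
the PR, $\mathcal{R}_p(\mathrm{SNR}_p)$, is a continuous function of the receiver SNR, $\mathrm{SNR}_p$. However, with practical MCSs,
$\mathcal{R}_p(\mathrm{SNR}_p)$ is usually a non-decreasing function of $\mathrm{SNR}_p$ with a finite rate granularity, i.e., constituting
only a finite number of discrete rate values. In this case, suppose that $\mathcal{R}_p(\mathrm{SNR}_p^{(i)}) = r_p^{(i)}$, with $0 <
\underline{\mathrm{SNR}}_p^{(i)} \leq \mathrm{SNR}_p^{(i)} < \overline{\mathrm{SNR}}_p^{(i)}$, $i = 0, 1$, where $r_p^{(i)}$ denotes a discrete rate value, and $\underline{\mathrm{SNR}}_p^{(i)}$ and $\overline{\mathrm{SNR}}_p^{(i)}$
are the corresponding SNR thresholds. In this case, although the CR cannot determine the exact value of
$h_{cp}/\sigma_p^2$ from (9), it can safely estimate the range of this value as

$$\left( \frac{\underline{\mathrm{SNR}}_p^{(0)}\, q_p^{(1)}}{\overline{\mathrm{SNR}}_p^{(1)}\, q_p^{(0)}} - 1 \right) \frac{1}{p_c} < \frac{h_{cp}}{\sigma_p^2} < \left( \frac{\overline{\mathrm{SNR}}_p^{(0)}\, q_p^{(1)}}{\underline{\mathrm{SNR}}_p^{(1)}\, q_p^{(0)}} - 1 \right) \frac{1}{p_c}. \tag{16}$$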
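
A minimal Python sketch of this range estimate follows. It assumes a hypothetical MCS table given as a sorted list of discrete rates with the minimum SNR required for each; the function names and the table values are illustrative and not taken from the paper. The bounds follow from replacing each inverse-rate term in the estimation rule by the end points of its SNR interval.

```python
import math

def snr_interval(rate, rates, thresholds):
    """Return the SNR interval [lo, hi) that maps to a discrete rate.

    rates is sorted ascending; thresholds[k] is the minimum SNR for rates[k].
    The highest rate has no upper threshold, so hi is infinite there.
    """
    k = rates.index(rate)
    lo = thresholds[k]
    hi = thresholds[k + 1] if k + 1 < len(thresholds) else math.inf
    return lo, hi

def hcp_range(q0, r0, q1, r1, p_c, rates, thresholds):
    """Lower and upper bounds on h_cp / sigma_p^2 under finite rate granularity."""
    lo0, hi0 = snr_interval(r0, rates, thresholds)
    lo1, hi1 = snr_interval(r1, rates, thresholds)
    lower = ((lo0 * q1) / (hi1 * q0) - 1.0) / p_c
    upper = ((hi0 * q1) / (lo1 * q0) - 1.0) / p_c
    return max(lower, 0.0), upper   # the normalized gain is non-negative

# Hypothetical three-level MCS with thresholds taken from log2(1 + SNR).
rates, thresholds = [1, 2, 3], [1.0, 3.0, 7.0]
lower, upper = hcp_range(1.0, 2, 1.25, 1, 2.0, rates, thresholds)
```

The observations in this example are consistent with true SNRs of 4 and 2.5 and a true normalized gain of 0.5, which falls inside the returned range; coarser rate tables widen the range.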

3) Power Measurement Noise: Another assumption we have made on the estimation using (9) is
that the sensor noise at CR-Tx is ignored for estimating the received PR signal powers, $q_p^{(0)}$ and $q_p^{(1)}$,
before and after the CR probing. In practice, only a finite number of PR signal samples can be obtained
during the initial sensing and re-sensing periods at CR-Tx, which are corrupted by the receiver noise.
For convenience, we assume that the noise power at CR-Tx is $\sigma_c^2$, the same as that at CR-Rx, and $\sigma_c^2$ is
known to CR-Tx. Also assume that $M$ independent signal samples are obtained during both the initial
sensing and re-sensing periods at CR-Tx, denoted by $s^{(i)}(1), \ldots, s^{(i)}(M)$, $i = 0, 1$. Specifically, we have

$$s^{(i)}(m) = s_p^{(i)}(m) + \nu^{(i)}(m), \quad m = 1, \ldots, M \tag{17}$$

where $s_p^{(i)}(m)$ denotes the PR signal component, with $\frac{1}{M} \sum_{m=1}^{M} |s_p^{(i)}(m)|^2 = q_p^{(i)}$, $i = 0, 1$, and the $\nu^{(i)}(m)$'s
are independent Gaussian noises with zero mean and variance of $\sigma_c^2$. Instead of having the exact values
for $q_p^{(0)}$ and $q_p^{(1)}$, we can obtain their estimated values as follows.

$$\hat{q}_p^{(i)} = \frac{1}{M} \sum_{m=1}^{M} |s^{(i)}(m)|^2 - \sigma_c^2, \quad i = 0, 1. \tag{18}$$

According to the central limit theorem [25], if the number of samples $M$ is large enough (e.g., $\geq 10$
in practice), the above estimation statistics are asymptotically normally distributed with corresponding
mean

$$E(\hat{q}_p^{(i)}) = q_p^{(i)}, \quad i = 0, 1 \tag{19}$$

and variance

$$ \epsilon^{(i)} := \mathrm{Var}(\hat{q}_p^{(i)}) = \frac{\sigma_c^2 (\sigma_c^2 + 2 q_p^{(i)})}{M}, \quad i = 0, 1. \tag{20} $$

Since the $q_p^{(i)}$'s are unknown at CR-Tx, the exact values of the $\epsilon^{(i)}$'s are not available at
CR-Tx. However, if it is known that the PR transmit powers must be below a prescribed maximum value,
denoted by $P_{\max}$, upper bounds on the $\epsilon^{(i)}$'s can be obtained as

$$ \epsilon^{(i)} \le \frac{\sigma_c^2 (\sigma_c^2 + 2 P_{\max})}{M} := \epsilon, \quad i = 0, 1. \tag{21} $$

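Thus, it follows that

$$ \mathrm{Prob}\left( q_p^{(i)} < \hat{q}_p^{(i)} - \zeta\sqrt{\epsilon} \right) \le \mathrm{Prob}\left( q_p^{(i)} < \hat{q}_p^{(i)} - \zeta\sqrt{\epsilon^{(i)}} \right) \tag{22} $$

$$ = Q(\zeta) \tag{23} $$

where $Q(\cdot)$ is the complementary cumulative distribution function of the standard normal
distribution [25], and $\zeta > 0$ is a design parameter. Similarly, we have

$$ \mathrm{Prob}\left( q_p^{(i)} > \hat{q}_p^{(i)} + \zeta\sqrt{\epsilon} \right) \le Q(\zeta). \tag{24} $$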

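In other words, we have a belief in probability of at least $1 - Q(\zeta)$ for
$q_p^{(i)} \ge (\hat{q}_p^{(i)} - \zeta\sqrt{\epsilon})$ and $q_p^{(i)} \le (\hat{q}_p^{(i)} + \zeta\sqrt{\epsilon})$.
Accordingly, from the estimation rule, it follows that with a probability of at least $1 - Q(\zeta)$

$$ \frac{h_{cp}}{\sigma_p^2} \le \frac{1}{p_c} \left( \frac{(\hat{q}_p^{(1)} + \zeta\sqrt{\epsilon})\, \mathrm{SNR}_p^{(0)}}{(\hat{q}_p^{(0)} - \zeta\sqrt{\epsilon})\, \mathrm{SNR}_p^{(1)}} - 1 \right). \tag{25} $$

Similarly, with the same probability guarantee, we have

$$ \frac{h_{cp}}{\sigma_p^2} \ge \frac{1}{p_c} \left( \frac{(\hat{q}_p^{(1)} - \zeta\sqrt{\epsilon})\, \mathrm{SNR}_p^{(0)}}{(\hat{q}_p^{(0)} + \zeta\sqrt{\epsilon})\, \mathrm{SNR}_p^{(1)}} - 1 \right). \tag{26} $$

Note that in (25) and (26), we have assumed that $\hat{q}_p^{(0)} > \zeta\sqrt{\epsilon}$ and
$\hat{q}_p^{(1)} > \zeta\sqrt{\epsilon}$, respectively. Thus, even with a finite number of observation samples
corrupted by additive noises, CR-Tx can still obtain a pair of upper and lower bounds on $h_{cp}/\sigma_p^2$
with a large belief probability (by setting a sufficiently large value for $\zeta$). However, if the chosen
$\zeta$ is too large, the uncertainty range of the estimate also increases.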
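
As a rough illustration of the power estimation step in (17)-(24), the following Python sketch computes the
noise-corrected sample power, a variance bound, and the resulting belief interval for the true received
power. The function names and the synthetic constant-envelope test signal are hypothetical. The variance
bound assumes circularly symmetric complex Gaussian noise, for which it takes the form
$\sigma_c^2(\sigma_c^2 + 2P_{\max})/M$; for real-valued noise the same derivation would give an extra factor
of 2. The parameter values are the ones used in the numerical examples of this paper.

```python
import math
import random


def q_function(x):
    """Complementary CDF of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def estimate_power(samples, noise_var):
    """Average sample power minus the known noise power, in the spirit of (18)."""
    return sum(abs(s) ** 2 for s in samples) / len(samples) - noise_var


def variance_bound(noise_var, p_max, num_samples):
    """Upper bound on the estimator variance (assumed form, complex Gaussian noise)."""
    return noise_var * (noise_var + 2.0 * p_max) / num_samples


def power_interval(q_hat, eps, zeta):
    """Interval [q_hat - zeta*sqrt(eps), q_hat + zeta*sqrt(eps)] for the true power."""
    half_width = zeta * math.sqrt(eps)
    return q_hat - half_width, q_hat + half_width


def synthetic_samples(q_true, noise_var, num_samples, rng):
    """Hypothetical test signal: constant-envelope PR component plus complex noise."""
    std = math.sqrt(noise_var / 2.0)  # standard deviation per real dimension
    amp = math.sqrt(q_true)
    return [complex(amp + rng.gauss(0.0, std), rng.gauss(0.0, std))
            for _ in range(num_samples)]


if __name__ == "__main__":
    rng = random.Random(0)
    noise_var, p_max, m, zeta = 1.0, 100.0, 500, 2.3
    samples = synthetic_samples(5.0, noise_var, m, rng)
    q_hat = estimate_power(samples, noise_var)
    eps = variance_bound(noise_var, p_max, m)
    low, high = power_interval(q_hat, eps, zeta)
    print(f"q_hat = {q_hat:.3f}, interval = [{low:.3f}, {high:.3f}]")
    print(f"Q(zeta) = {q_function(zeta):.4f}")
```

Because the bound uses $P_{\max}$ rather than the unknown true power, the interval is conservative: it is
wider than strictly necessary whenever the PR transmits well below $P_{\max}$.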

4) Channel Variation: Last, we address the issue on possible channel variations during the implemen- 
tation of the proposed CR active learning scheme. It is worth noting that the assumption of constant 
channels has usually been made in prior works (see, e.g., [11], [22], [23]) on iterative user power/rate
adaptations in decentralized multiuser systems. From the proof of Proposition 4.1, we see that if the
channel power gain, $h_{pc}$, through which CR-Tx estimates the received signal powers $q_p^{(0)}$ and
$q_p^{(1)}$ from PR-Tx, changes from the initial sensing stage to the re-sensing stage, the estimation
result will be affected. Let $h_{pc}^{(0)}$ and $h_{pc}^{(1)}$ denote the true values of $h_{pc}$ during the
initial sensing and re-sensing periods, respectively. We can rewrite the estimation rule as (assuming
perfect rate and power estimation)

$$ \frac{h_{cp}}{\sigma_p^2} = \frac{1}{p_c}\left( \frac{h_{pc}^{(0)}\, q_p^{(1)}\, \mathrm{SNR}_p^{(0)}}{h_{pc}^{(1)}\, q_p^{(0)}\, \mathrm{SNR}_p^{(1)}} - 1 \right). \tag{27} $$

Although CR-Tx does not know the exact values of $h_{pc}^{(0)}$ and $h_{pc}^{(1)}$, it can predict the
approximate range for their ratio given the channel coherence time relative to the time interval between
the initial sensing and re-sensing stages, and obtain the corresponding upper and lower bounds on the
estimated value from (27). Furthermore, the channel power gain $h_{cp}$ from CR-Tx to PR-Rx may also change
from the probing stage to the data transmission stage. As for $h_{pc}$, given the channel coherence time
and the time interval between these two stages, CR-Tx can estimate the range of $h_{cp}$ accordingly.

V. Supervised Transmission

In the previous section, we have proposed an active learning scheme for the CR to estimate the channel
gain from CR-Tx to PR-Rx by exploiting the hidden PR feedback. In this section, we design supervised
transmission for the CR data transmission stage shown in Fig. 3(b), based on the acquired knowledge from
active learning. In the following, we address two main design objectives for CR supervised transmission:
controlling the PR link performance degradation and maximizing the CR link throughput.

A. PR Performance Loss Control

In this subsection, we illustrate how CR-Tx can apply the estimated CR-to-PR channel gain from active
learning to predict the performance loss of the PR link due to CR data transmission. For simplicity,
we assume that the estimation of $h_{cp}/\sigma_p^2$ is perfect at CR-Tx, although the obtained results can
be easily extended to the case of imperfect channel estimation by utilizing the derived estimation bounds
in Section IV-B. We consider two general types of performance losses for the PR link: One is for the case
where the PR employs variable-rate transmission (e.g., with CP or WF power control), named as rate penalty,
which measures the PR rate loss due to the CR interference, expressed as $R_l = r_p^{(0)} - r_p^{(d)}$, where
$r_p^{(d)}$ denotes the resultant PR transmit rate in the CR data transmission stage; the other is for the
case where the PR employs constant-rate transmission (e.g., with TCI power control), named as power
penalty, which measures the additional transmit power in dB required for the PR to maintain the prescribed
constant rate under the CR interference, expressed as $P_l = 10 \log_{10}(p_p^{(d)}/p_p^{(0)})$, where
$p_p^{(d)}$ denotes the resultant PR transmit power in the CR data transmission stage. Note that $r_p^{(0)}$
and $p_p^{(0)}$ denote the PR transmit rate and power without the CR interference, respectively, in the CR
initial sensing stage. Let $p_c^{(d)}$ denote the CR transmit power in the data transmission stage.

First, the rate penalty for the PR link can be more explicitly expressed as 

$$ R_l = \log_2\left(1 + \frac{h_p p_p^{(0)}}{\Gamma_p \sigma_p^2}\right) - \log_2\left(1 + \frac{h_p p_p^{(d)}}{\Gamma_p (\sigma_p^2 + h_{cp} p_c^{(d)})}\right). \tag{28} $$

Note that for the convenience of analysis, we have assumed the "SNR gap approximation" that accounts
for the rate loss from the optimal capacity due to the practical/non-Gaussian MCS employed by the PR [26],
i.e., $\mathcal{R}_p(\mathrm{SNR}_p) = \log_2(1 + \mathrm{SNR}_p/\Gamma_p)$, where $\Gamma_p \ge 1$ denotes the constant SNR
gap for the PR.
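In the case of CP policy for the PR, from (28) it follows that

$$ R_l^{\mathrm{CP}} = \log_2\left(1 + \frac{h_p Q}{\Gamma_p \sigma_p^2}\right) - \log_2\left(1 + \frac{h_p Q / (\Gamma_p \sigma_p^2)}{1 + h_{cp} p_c^{(d)}/\sigma_p^2}\right) \tag{29} $$

$$ \le \log_2\left(1 + \frac{h_p Q}{\Gamma_p \sigma_p^2}\right) - \log_2\left(\frac{1 + h_p Q / (\Gamma_p \sigma_p^2)}{1 + h_{cp} p_c^{(d)}/\sigma_p^2}\right) \tag{30} $$

$$ = \log_2\left(1 + \frac{h_{cp} p_c^{(d)}}{\sigma_p^2}\right). \tag{31} $$

Therefore, CR-Tx knows that if it transmits with power $p_c^{(d)}$, the resultant rate loss of the PR is
upper-bounded by the value given in (31), which depends on the estimated $h_{cp}/\sigma_p^2$ but is
independent of the PR transmit power $Q$ and SNR gap $\Gamma_p$.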

Consider next the case of WF power control for the PR, which is similar to that given earlier but with
$\gamma_p$ therein replaced by $\gamma_p/\Gamma_p$. In this case, assuming that $p_p^{(0)} > 0$ (otherwise the
rate penalty for the PR is trivially zero), it follows from (28) that

$$ R_l^{\mathrm{WF}} = \begin{cases} \log_2\left(1 + \dfrac{h_{cp} p_c^{(d)}}{\sigma_p^2}\right) & \text{if } \dfrac{h_{cp} p_c^{(d)}}{\sigma_p^2} < 2^{r_p^{(0)}} - 1 \\ r_p^{(0)} & \text{otherwise.} \end{cases} \tag{33} $$

Thus, CR-Tx can predict the exact rate loss of the PR as a function of $p_c^{(d)}$, based on the estimated
$h_{cp}/\sigma_p^2$ and $r_p^{(0)}$ from the active learning.

Last, consider the power penalty of the PR with the TCI power control given earlier. Assuming that the PR
remains active in the CR data transmission stage, i.e., the CR interference power is not sufficiently
large to render the PR into a transmit outage (otherwise the power penalty of the PR becomes irrelevant),
it thus follows that

$$ P_l^{\mathrm{TCI}} = 10 \log_{10}\left(1 + \frac{h_{cp} p_c^{(d)}}{\sigma_p^2}\right). \tag{34} $$

Thus, CR-Tx can measure the power penalty of the PR as a function of $p_c^{(d)}$.

From the above discussions, we see that the derived rate and power penalties enable CR-Tx to predict 
quantitatively the resultant PR performance losses corresponding to different transmit power levels of the 
CR, using only the observed/estimated parameters from the active learning. 
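
The penalty predictions above can be sketched in a few lines of Python. This is only an illustration under
stated assumptions: the function names are hypothetical, `g` stands for the estimated ratio
$h_{cp}/\sigma_p^2$, and the closed forms $\log_2(1 + g p_c^{(d)})$ and $10\log_{10}(1 + g p_c^{(d)})$ are
taken as the CP bound, the WF loss while the PR stays on, and the TCI power penalty. The last two helpers
invert these expressions to obtain the largest CR power that respects a given penalty budget.

```python
import math


def cp_rate_penalty_bound(g, p_c):
    """Upper bound on the PR rate loss (bits/s/Hz) under CP; g = h_cp / sigma_p^2."""
    return math.log2(1.0 + g * p_c)


def wf_rate_penalty(g, p_c, r0):
    """PR rate loss under WF: log2(1 + g*p_c) while the PR stays on, else r0."""
    if g * p_c < 2.0 ** r0 - 1.0:
        return math.log2(1.0 + g * p_c)
    return r0


def tci_power_penalty_db(g, p_c):
    """Extra PR transmit power (dB) needed under TCI to hold its rate."""
    return 10.0 * math.log10(1.0 + g * p_c)


def max_cr_power_for_rate_penalty(g, max_loss_bits):
    """Largest CR power whose predicted rate penalty stays within max_loss_bits."""
    return (2.0 ** max_loss_bits - 1.0) / g


def max_cr_power_for_power_penalty(g, max_penalty_db):
    """Largest CR power whose predicted power penalty stays within max_penalty_db."""
    return (10.0 ** (max_penalty_db / 10.0) - 1.0) / g


if __name__ == "__main__":
    g = 0.5  # example value of h_cp / sigma_p^2
    for p_c in (1.0, 5.0, 10.0):
        print(p_c, cp_rate_penalty_bound(g, p_c), tci_power_penalty_db(g, p_c))
    print(max_cr_power_for_power_penalty(g, 3.0))
```

If only an upper bound on $h_{cp}/\sigma_p^2$ is available, passing that bound as `g` gives a conservative
(smaller) maximum CR power.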

B. CR Achievable Rate 

In the previous subsection, we have shown for the CR supervised transmission how to control the
resultant PR link performance degradation. With a given PR rate/power penalty, CR-Tx can derive
accordingly the maximum tolerable transmit power $p_c^{(d)}$. In this subsection, we analyze the CR link
achievable rate as a function of $p_c^{(d)}$. Due to the space limitation, we consider only the case of
single-user detection at CR-Rx for decoding the CR message, by treating the interference from PR-Tx as
additive noise. However, it is worth noting that more advanced multiuser detection techniques can be
employed at CR-Rx to decode both the CR and PR messages in order to suppress the PR interference (details
are omitted here; the interested readers may refer to a preliminary version of this paper [27]).

With single-user detection, the achievable rate of the CR link in the data transmission stage can be
expressed as

$$ r_c^{(d)} = \log_2\left(1 + \frac{h_c p_c^{(d)}}{\Gamma_c (\sigma_c^2 + h_{pc} p_p^{(d)})}\right) \tag{35} $$

where $\Gamma_c \ge 1$ denotes the SNR gap for the CR, and

$$ p_p^{(d)} = \mathcal{P}_p\left(\frac{h_p}{\sigma_p^2 + h_{cp} p_c^{(d)}}\right) \tag{36} $$

with $\mathcal{P}_p$ denoting the PR employed power control policy (e.g., CP, TCI, or WF). It is interesting
to observe that in general the CR achievable rate is related to the CR transmit power $p_c^{(d)}$ not only
through the direct link from CR-Tx to CR-Rx, but also through the interference link from CR-Tx to PR-Rx,
the resultant PR power adaptation, and the "feedback" interference from PR-Tx to CR-Rx. Thus, CR-Tx is
able to control the interference power from PR-Tx by changing its transmit power $p_c^{(d)}$ via the hidden
PR feedback.

With the PR feedback interference, some interesting observations can be drawn for the CR achievable
rate as a function of $p_c^{(d)}$. Note that without the PR interference, $r_c^{(d)}$ is an increasing
function of $p_c^{(d)}$. However, with the PR feedback interference, the interference power from PR-Tx can
also be an increasing function of $p_c^{(d)}$ in the case of persistent power control for the PR (e.g.,
TCI). As a result, it is unclear in this case whether increasing the CR transmit power will result in a
net gain for its achievable rate. Thus, it is pertinent to investigate $r_c^{(d)}$ further for the CR link
under the PR feedback interference, as shown in the following proposition.

Proposition 5.1: For any $p_c^{(d)} > 0$ under which $\mathcal{P}_p(\gamma_p)$ with
$\gamma_p = \frac{h_p}{\sigma_p^2 + h_{cp} p_c^{(d)}}$ is a positive, continuous and differentiable function
of $\gamma_p$, $\frac{\partial r_c^{(d)}}{\partial p_c^{(d)}} > 0$ if and only if
$\frac{\partial F(p_c^{(d)})}{\partial p_c^{(d)}} > 0$, where

$$ F(p_c^{(d)}) = \frac{p_c^{(d)}}{\sigma_c^2 + h_{pc} \mathcal{P}_p(\gamma_p)}. \tag{37} $$

The proof of Proposition 5.1 follows from (35) and is thus omitted here for brevity. It is noted that CP
and WF power control policies for the PR satisfy the condition given in Proposition 5.1 straightforwardly,
since they are both non-persistent power control. For the TCI power control of the PR, which is
persistent, it can be verified (details are omitted here for brevity) that
$\frac{\partial F(p_c^{(d)})}{\partial p_c^{(d)}} > 0$ for all values of $p_c^{(d)} > 0$, as required in
Proposition 5.1. It thus follows that $r_c^{(d)}$ is a strictly increasing function of $p_c^{(d)}$ in all
cases of CP, WF, or TCI power control policies for the PR.
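
The monotonicity claim can be checked numerically with a minimal sketch such as the one below. It is not
a proof and rests on several assumptions: the function names are hypothetical, the TCI rule is taken to
be plain channel inversion toward a target receiver SNR with the outage threshold ignored, and the
default values ($h_p = h_c = 1$, cross gains of 0.5, unit noise powers, $\Gamma_c = 1$, $Q = 100$, target
SNR of 10) are borrowed from the numerical examples for illustration only.

```python
import math


def cr_rate(p_c, pr_power, h_c=1.0, h_pc=0.5, noise_c=1.0, gap_c=1.0):
    """CR achievable rate with single-user detection, PR signal treated as noise."""
    return math.log2(1.0 + h_c * p_c / (gap_c * (noise_c + h_pc * pr_power)))


def pr_power_cp(p_c, q=100.0):
    """Constant-power PR: transmit power does not react to the CR."""
    return q


def pr_power_tci(p_c, snr_target=10.0, h_p=1.0, h_cp=0.5, noise_p=1.0):
    """Assumed TCI rule: invert the effective gain to hold the target SNR (no outage)."""
    gamma_p = h_p / (noise_p + h_cp * p_c)
    return snr_target / gamma_p


if __name__ == "__main__":
    for p_c in (1.0, 5.0, 10.0, 18.0):
        r_cp = cr_rate(p_c, pr_power_cp(p_c))
        r_tci = cr_rate(p_c, pr_power_tci(p_c))
        print(f"p_c = {p_c:5.1f}: CP rate = {r_cp:.4f}, TCI rate = {r_tci:.4f}")
```

Under these assumptions the TCI curve keeps increasing but saturates, because the PR feedback interference
grows linearly with the CR power, whereas the CP curve faces a fixed interference level.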



VI. Numerical Examples

In this section, we present numerical examples to validate the effectiveness of our proposed schemes for
CR active learning and supervised transmission. It is assumed that $h_p = h_c = h_{pc} = 1$ and
$h_{cp} = h_{pc} = 0.5$ in Fig. 1. For simplicity, we assume that all these channels are constant over the
PR and CR transmission blocks where the proposed CR schemes are implemented. We evaluate the performance
for the CR-to-PR channel gain estimation based on active learning, as well as the PR performance
degradation control and CR achievable rate with CR supervised transmission. We consider the following two
scenarios: Case I: the PR employs a constant-power (CP) variable-rate transmission; and Case II: the PR
employs a constant-rate variable-power (with TCI power control) transmission. For convenience, we assume
that $\sigma_p^2 = \sigma_c^2 = 1$ and $\Gamma_c = 1$.

Consider first Case I, where the PR transmits with a constant power $Q = 100$. In this case, we are
interested in investigating the effects of finite rate granularity for the PR variable-rate transmission
on the performances of the CR active learning and supervised transmission. Suppose that the PR transmit
rate for a given effective channel gain $\gamma_p$ is expressed as

$$ r_p = \left\lfloor \log_2\left(1 + \frac{\gamma_p Q}{\Gamma_p}\right) \frac{1}{b} \right\rfloor b \tag{38} $$

in bps/Hz, where $\lfloor \cdot \rfloor$ denotes the floor operation, and $b > 0$ denotes the "bit
granularity" due to the fact that a practical MCS only supports a finite set of discrete transmit rates
corresponding to integer multiples of $b$. We assume that $\Gamma_p = 3$ dB and $b = 1$ (i.e., one-bit
granularity). From (16), it follows that the upper and lower bounds on $h_{cp}$ in the case of one-bit
granularity are obtained as



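$$ \frac{1}{p_c}\left(\frac{2^{r_p^{(0)}} - 1}{2^{r_p^{(1)}+1} - 1} - 1\right) \le h_{cp} \le \frac{1}{p_c}\left(\frac{2^{r_p^{(0)}+1} - 1}{2^{r_p^{(1)}} - 1} - 1\right) \tag{39} $$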



where $r_p^{(0)}$ and $r_p^{(1)}$ denote the discrete rates of the PR observed by the CR in the sensing and
re-sensing stages, respectively. In Fig. 4(a), we show the estimated upper and lower bounds for $h_{cp}$
using the above estimation rule. It is observed that with a small value of the CR probing signal power
$p_c$, the gap between the estimated upper and lower bounds for $h_{cp}$ is large, suggesting that the
estimation of $h_{cp}$ is not accurate. This is due to the fact that if $p_c$ is too small, the
interference at PR-Rx is not sufficiently strong to make the PR reduce its transmit rate by at least one
bit (note that $b = 1$), and as a result, the CR observes the same value of $r_p^{(1)}$ as $r_p^{(0)}$.
However, with a larger value of $p_c$, the CR is able to make $r_p^{(1)} < r_p^{(0)}$ and thus obtain a more
accurate estimation for $h_{cp}$. Thus, there is in general a tradeoff between minimizing the PR
performance degradation and the CR-to-PR channel estimation error for the CR active learning. In
Fig. 4(b) and Fig. 4(c), we show the PR rate penalty and CR achievable rate, respectively, vs. CR transmit
power $p_c^{(d)}$ for CR supervised data transmission. It is observed that both the PR rate penalty and CR
transmit rate increase with $p_c^{(d)}$. Moreover, in Fig. 4(b), we compare the actual resultant PR rate
penalty (with one-bit granularity) to its estimated value using (31) and the estimated upper bound on
$h_{cp}$ from active learning with $p_c = 10$. It is observed that the estimated PR rate penalties are
indeed valid upper bounds on their true values for different values of $p_c^{(d)}$.
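
The effect of rate granularity on the estimation range can be reproduced with a short sketch. The code
below is illustrative only: the function names are hypothetical, and the bound expressions are an assumed
form that follows from the floor-type rate rule with a CP-controlled PR (a rate of $r$ implies
$2^{r} - 1 \le \mathrm{SNR}_p/\Gamma_p < 2^{r+b} - 1$). The parameter values ($Q = 100$, $\Gamma_p = 3$ dB,
$b = 1$, $h_p = 1$, $h_{cp} = 0.5$, $\sigma_p^2 = 1$) are those quoted in the text.

```python
import math


def pr_rate(gamma_p, q=100.0, gap_db=3.0, b=1.0):
    """Discrete PR rate: log2(1 + gamma_p*Q/Gamma_p) rounded down to a multiple of b."""
    gap = 10.0 ** (gap_db / 10.0)
    return math.floor(math.log2(1.0 + gamma_p * q / gap) / b) * b


def hcp_bounds(r0, r1, p_c, b=1.0):
    """Range for h_cp / sigma_p^2 from the rates seen before (r0) and after (r1) probing."""
    lower = ((2.0 ** r0 - 1.0) / (2.0 ** (r1 + b) - 1.0) - 1.0) / p_c
    if r1 > 0:
        upper = ((2.0 ** (r0 + b) - 1.0) / (2.0 ** r1 - 1.0) - 1.0) / p_c
    else:
        upper = math.inf  # PR rate dropped to zero: no finite upper bound
    return max(lower, 0.0), upper  # the channel gain cannot be negative


if __name__ == "__main__":
    h_p, h_cp, noise_p = 1.0, 0.5, 1.0
    r0 = pr_rate(h_p / noise_p)
    for p_c in (0.1, 1.0, 10.0):
        r1 = pr_rate(h_p / (noise_p + h_cp * p_c))
        low, high = hcp_bounds(r0, r1, p_c)
        print(f"p_c = {p_c:5.1f}: r0 = {r0}, r1 = {r1}, bounds = [{low:.3f}, {high:.3f}]")
```

With a weak probing signal the observed rate does not change and the range stays wide; a stronger probe
forces a rate drop and narrows the range around the true value of 0.5.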

It is worth comparing the spectrum-sharing performance for the PR and CR links under the proposed
active learning and supervised transmission for the CR with that of the approach (referred to as "No
Feedback") that does not exploit the PR hidden feedback, and that of the approach (referred to as
"Perfect Feedback") with the perfect knowledge of the CR-to-PR channel via a dedicated feedback channel
from PR-Rx to CR-Tx. Note that for all three design approaches, the achievable rates for the CR with a
given transmit power $p_c^{(d)}$ are identical, as shown in Fig. 4(c). However, the main differences among
these designs are highlighted as follows. For the case of "No Feedback", the CR has no means to predict
the PR performance loss as a function of $p_c^{(d)}$ and thus cannot deploy any opportunistic transmission;
as a result, the CR has to transmit constantly with a very low power, which results in low spectral
efficiency. In contrast, with the newly proposed design, the CR can always predict its maximum transmit
power given the PR transmission margin and decide its transmit rate accordingly. On the other hand, for
the case of "Perfect Feedback", as shown in Fig. 4(b), for a given PR rate penalty value, the CR with the
perfect channel knowledge can transmit with a larger power than the proposed design with active learning
based channel estimation, and thus the maximum achievable rate for the CR also becomes larger (cf.
Fig. 4(b) and Fig. 4(c)).

Next, consider Case II, where the PR transmits with a constant rate or equivalently maintains a constant 
receiver SNR, SNRp 7 "* = 10. Thus, the TCI power control given in CO) is used by the PR with 7^ = 0.1. 
In this case, we are interested in investigating the effects of a finite number of observation samples and
receiver noise at CR-Tx for estimating the received PR signal powers on the performances of CR active
learning and supervised transmission. From (25) and (26), it follows that the upper and lower bounds on
$h_{cp}$ in the case of a finite number of observed PR signal samples are obtained as

$$ \frac{1}{p_c}\left(\frac{\hat{q}_p^{(1)} - \zeta\sqrt{\epsilon}}{\hat{q}_p^{(0)} + \zeta\sqrt{\epsilon}} - 1\right) \le h_{cp} \le \frac{1}{p_c}\left(\frac{\hat{q}_p^{(1)} + \zeta\sqrt{\epsilon}}{\hat{q}_p^{(0)} - \zeta\sqrt{\epsilon}} - 1\right) \tag{40} $$



where $\hat{q}_p^{(0)}$ and $\hat{q}_p^{(1)}$ denote the observed powers at CR-Tx in the sensing and
re-sensing stages, respectively. In order to keep the estimated $h_{cp}$ within the above range with a
probability guarantee of 99%, we choose $\zeta = 2.3$ since $Q(2.3) \approx 0.01$. Furthermore, we set
$P_{\max} = 100$ and $M = 500$ for determining the constant $\epsilon$ defined in (21). In Fig. 5(a), we
show the estimated upper and lower bounds for $h_{cp}$ using the above rule. Similar to our previous
observations for Fig. 4(a), it is observed that the CR probing power $p_c$ needs to be sufficiently large
in order to make a reasonably good estimate of $h_{cp}$. In Fig. 5(b) and Fig. 5(c), we show the PR power
penalty and CR achievable rate, respectively, vs. CR transmit power $p_c^{(d)}$ for CR supervised data
transmission. It is observed that both the PR power penalty and CR transmit rate increase with
$p_c^{(d)}$. Moreover, in Fig. 5(b), we compare the actual PR power penalty to its estimated value using
(34) and the estimated upper bound on $h_{cp}$ from active learning with $p_c = 10$. It is observed that
the estimated PR power penalties are valid upper bounds on the true values, which become tighter for
smaller values of $p_c^{(d)}$. Comparing the CR achievable rates in Fig. 4(c) and Fig. 5(c), it is observed
that the CR rate increase with $p_c^{(d)}$ is much slower in the latter than in the former case. This is
because for Fig. 5(c), the PR employs TCI power control instead of CP as for Fig. 4(c), and thus the PR
feedback interference power at CR-Rx increases with $p_c^{(d)}$ instead of being a constant as for the case
of Fig. 4(c) with CP.
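
To give a feel for how the probing power affects the estimation range in Case II, the sketch below
evaluates bounds of the form $\frac{1}{p_c}(\text{power ratio} - 1)$ with a margin of
$\zeta\sqrt{\epsilon}$ applied to each observed power. Everything here is an assumption made for
illustration: the function names are hypothetical, the variance bound uses the complex-noise form
$\sigma_c^2(\sigma_c^2 + 2P_{\max})/M$, the observed powers are taken as noise-free, and the received power
at CR-Tx before probing is set to 10 (a unit sensing gain with the PR holding a receiver SNR of 10).

```python
import math


def zeta_margin(noise_var, p_max, num_samples, zeta):
    """Margin zeta*sqrt(eps), with eps an assumed upper bound on the estimator variance."""
    eps = noise_var * (noise_var + 2.0 * p_max) / num_samples
    return zeta * math.sqrt(eps)


def hcp_bounds_tci(q0_hat, q1_hat, p_c, margin):
    """Range for h_cp / sigma_p^2 from powers observed before and after probing."""
    if q0_hat <= margin or q1_hat <= margin:
        raise ValueError("observed powers must exceed the margin")
    lower = ((q1_hat - margin) / (q0_hat + margin) - 1.0) / p_c
    upper = ((q1_hat + margin) / (q0_hat - margin) - 1.0) / p_c
    return max(lower, 0.0), upper  # the channel gain cannot be negative


if __name__ == "__main__":
    margin = zeta_margin(noise_var=1.0, p_max=100.0, num_samples=500, zeta=2.3)
    h_cp, q0 = 0.5, 10.0
    for p_c in (1.0, 5.0, 10.0):
        q1 = q0 * (1.0 + h_cp * p_c)  # TCI raises the PR power to offset the interference
        low, high = hcp_bounds_tci(q0, q1, p_c, margin)
        print(f"p_c = {p_c:5.1f}: bounds = [{low:.3f}, {high:.3f}]")
```

The range shrinks as the probing power grows, since the margin becomes small relative to the change in
the observed PR power.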

VII. Conclusion 

This paper introduces a new design paradigm for spectrum sharing based CRs, where the CR designs 
its learning and transmission from the observed PR transmit power/rate adaptations upon receiving a 
probing signal from the CR, namely the hidden PR feedback. First, a novel active learning scheme 
is proposed for the CR to estimate the channel gain from its transmitter to the PR receiver, which is 
essential for the CR interference control to the PR. Second, with the acquired channel knowledge and PR 
transmit adaptations from active learning, the CR supervised data transmission is designed by effectively 
controlling the performance degradation of the PR as a function of the CR transmit power. Moreover, this 
paper shows that the CR is able to predict its own achievable rate under the PR feedback interference, 
which is coupled with the CR transmit power via the hidden PR feedback. This paper presents a new 
transmission protocol for the CR to implement the proposed learning and transmission schemes, and 
proposes solutions to deal with various important practical issues. The results in this paper provide a
new promising approach to interference management for decentralized multiuser communication systems.

Fig. 1. Spectrum sharing between a PR link and a CR link.

Fig. 2. Plots of $p_p$ and $r_p$ as functions of $p_c$ for (a) CP; (b) TCI; and (c) WF power control of the PR.

Fig. 3. Transmission protocols for (a) the PR; and (b) the CR.

Fig. 4. Performance of CR active learning and supervised transmission when PR employs constant-power variable-rate transmission (Case
I): (a) CR-to-PR channel power gain estimation; (b) PR rate penalty; and (c) CR achievable rate.

Fig. 5. Performance of CR active learning and supervised transmission when PR employs constant-rate variable-power transmission (Case
II): (a) CR-to-PR channel power gain estimation; (b) PR power penalty; and (c) CR achievable rate.



